Large-Scale Knowledge Graph Identification using PSL (Extended Abstract)

Authors

  • Jay Pujara
  • Lise Getoor
  • William W. Cohen
Abstract

The web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge. A number of recent evaluation efforts have focused on automatic knowledge base population (Ji, Grishman, and Dang 2011; Artiles and Mayfield 2012), and many well-known broad-domain and open information extraction systems exist, including the Never-Ending Language Learning (NELL) project (Carlson et al. 2010), OpenIE (Etzioni et al. 2008), and efforts at Google (Pasca et al. 2006), which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence this extracted knowledge has recently been referred to as a knowledge graph (Singhal 2012). Unfortunately, most web-scale extraction systems do not take advantage of the rich dependencies found in the knowledge graph; instead, these approaches consider extractions independently, relying on simple heuristics to enforce consistency. Recent work demonstrates that reasoning jointly is a promising approach to improving the knowledge graph. Jiang, Lowd, and Dou (2012) choose candidate facts for inclusion in a knowledge base with a joint approach using Markov Logic Networks (MLNs) (Richardson and Domingos 2006). Jiang et al. provide a straightforward codification of ontological relations and candidate facts found in a knowledge base as rules in first-order logic and use MLNs to formulate a probabilistic model. However, due to the combinatorial explosion of Boolean assignments to random variables, inference and learning in MLNs pose intractable optimization problems. Jiang et al. limit the candidate facts they consider, restricting their dataset to a 2-hop neighborhood around each fact, and use a sampling approach to inference, estimating marginals using MC-SAT. Despite these approximations, their work demonstrates the utility of joint reasoning in comparison to a baseline that considers each fact independently.
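The combinatorial explosion mentioned above can be made concrete with a toy sketch. The code below is not the method of Jiang et al. or of this paper; it is a brute-force illustration, under assumed inputs, of why exact joint reasoning over Boolean fact assignments scales poorly. The candidate facts, their confidence values, the mutual-exclusion constraint, and the scoring rule (confidence minus 0.5 per accepted fact) are all hypothetical.

```python
from itertools import product

# Hypothetical candidate facts with extractor confidences (illustrative values only).
candidates = {
    "Lbl(kyrgyzstan, country)": 0.9,
    "Lbl(kyrgyzstan, bird)": 0.6,
    "Lbl(bishkek, city)": 0.8,
}

# Hypothetical ontological constraint: these two labels cannot both hold.
mutex = [("Lbl(kyrgyzstan, country)", "Lbl(kyrgyzstan, bird)")]


def best_assignment(candidates, mutex):
    """Enumerate every Boolean assignment and return the best consistent one.

    Returns the set of accepted facts and the number of assignments checked,
    which is 2**n for n candidate facts.
    """
    facts = list(candidates)
    best_score = float("-inf")
    best_accepted = set()
    checked = 0
    for values in product([False, True], repeat=len(facts)):
        checked += 1
        accepted = {f for f, v in zip(facts, values) if v}
        # Hard constraint: skip assignments that accept both mutually exclusive facts.
        if any(a in accepted and b in accepted for a, b in mutex):
            continue
        # Assumed scoring rule: reward facts whose confidence exceeds 0.5.
        score = sum(candidates[f] - 0.5 for f in accepted)
        if score > best_score:
            best_score = score
            best_accepted = accepted
    return best_accepted, checked


accepted, checked = best_assignment(candidates, mutex)
print(sorted(accepted), checked)
```

With three candidate facts the loop visits 8 assignments; each additional fact doubles that count, which is why approaches of this kind resort to restricting the candidate set or to sampling.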
Our work builds on the foundation of Jiang, Lowd, and Dou by providing a richer model for knowledge bases and vastly improving scalability. Our method transforms the noisy output of an information extraction system, a we de-

Similar resources

Large-Scale Knowledge Graph Identification using PSL

Building a web-scale knowledge graph, which captures information about entities and the relationships between them, represents a formidable challenge. While many large-scale information extraction systems operate on web corpora, the candidate facts they produce are noisy and incomplete. To remove noise and infer missing information in the knowledge graph, we propose knowledge graph identificatio...

Knowledge Graph Identification

Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the t...

LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring

Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although most of these schemes do not consider location information, there are scenarios in which location information can be obtained by nodes after their deployment. In this paper,...

Graph Summarization in Annotated Data Using Probabilistic Soft Logic

Annotation graphs, made available through the Linked Data initiative and Semantic Web, have significant scientific value. However, their increasing complexity makes it difficult to fully exploit this value. Graph summaries, which group similar entities and relations for a more abstract view on the data, can help alleviate this problem, but new methods for graph summarization are needed that han...

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

Publication date: 2013